MDEV-38891 fixed THD read access violation #4694
MDEV-38891 fixed THD read access violation #4694itzanway wants to merge 4857 commits intoMariaDB:10.6from
Conversation
Test output was affected by incompletely closed preceding connections. Make test agnostic to concurrent connections by querying information_schema.processlist only for connections that it uses. Avoid ID-CONNECTION_ID() expression, it is not deterministic.
Test was affected by incompletely closed preceding connections. Make test agnostic to concurrent connections by querying performance_schema.threads only for connections that it uses.
Embedded mode produces different thread identifiers, which makes replace_result replace unwanted occurences. Use temporary table for filtering out unwanted threads such that replace_result is not needed anymore. This is regression after 0eebe3a.
…UND mismatch) Test was affected by incompletely closed preceding connections. Make test agnostic to concurrent connections by querying performance_schema.threads only for connections that it uses.
Revert part of bead24b, which broke this test. The wait is still needed to make variables of preceding connection disappear.
Test was affected by incompletely closed preceding connections. Make test agnostic to concurrent connections by querying performance_schema.user_variables_by_thread only for connections that it uses.
Test was affected by incompletely closed preceding connections. Make test agnostic to concurrent connections by querying performance_schema.events_statements_current only for connections that it uses.
Similar to MDEV-37483, use file name encoding for dbnames to create directories. Adapt mariadb-import to convert the names back.
…ecution Routine name resolution performed on a temporary memory root during execution. So on the second execution LEX::spname members pointed to a cleared memory. Fixing to peform the resolution to parse time, like the CALL statement does. During the execution time LEX::spname members now stay untouched.
Send ok packet earlier for SELECT queries that does not have any updates. This is done in select_send::send_eof() Axel's select benchmarks shows that this has a notable speed improvement: 1 Thread: 28% TPS speedup 8 Threads: 23% 64 Threads 6% This was meassured with t_oltp_point_select With InnoDB running the client over sockets. Other things: - Moved error reporting of LIMIT ROWS EXAMINED from handle_select() to check_limit_rows_examined(). This is to ensure that the error is reported before send_eof() is called. - Removed duplicate "Query execution was interrupted" messages for the same query. killed_for_exceeding_limit_rows_warning_given was introduced for this purpose. We cannot use 'killed' flag in killed_for_exceeding_limit_rows() to detect if we already have produced a warning as the 'killed' in this case is reset for each union. Reviewer: Sergei Golubchik <[email protected]>
correct the length check. remove assertions that a file read from disk contains a specific substring
…e-level CREATE but not with global CREATE CHECK TABLE was inconsistently requiring SELECT privilege on global/db level or any privilege on the table/column level. Change to require any table-applicable privilege on any level.
don't trust the content of a file read from disk
… queries to fail don't copy field default values and check constraints in CREATE ... SELECT. CREATE ... SELECT means a table is created from a *result set* not from some other table. For backward compatibility, though, let's keep copying constant default values and the "compressed" attribute.
…DEFAULT VALUE() should only use table->insert_values when table->insert_values contains row values. table->insert_values gets row values for the ODKU clause so if VALUE() is used before that it shouldn't use table->insert_values
…sions 10.11.X and beyond my_getcputime() returns "cpu time in 1/10th on a microsecond (1e-7 s)"
otherwise it causes random failures in some later test that lists files in $datadir/test
don't let the parser create ridiculously deep joins that will be rejected later anyway
the "Test that bad value for plugin enum option is rejected correctly" needed multiple fixes: 1. don't set plugin-dir based on $MYSQLTEST_VARDIR, all plugins are in var/plugins, but $MYSQLTEST_VARDIR is var/1/, var/2/, etc if --parallel is used (that is, practically always), thus the ha_example.so cannot be loaded, because cannot be found. Test fails with "unknown option --plugin-example-enum" as the plugin is not loaded 2. force --plugin-maturity=experimental, otherwise even if not parallel the plugin will fail to load because of low maturity, test still fails with "unknown option --plugin-example-enum" 3. don't specify .so extension explicitly otherwise the plugin still doesn't load on windows, even if paths and maturity are fixed 4. set --plugin-example=FORCE otherwise plugin fails to load after reading --plugin-example-enum-var=noexist because of unknown enum value, the server ignores the failure and starts normally. the test hangs. 5. This needs the fix in sql_plugin.cc to detect that the plugin is forced even when some options failed to parse. It used to consider plugin forced only if all options parsed correctly, which was wrong. Now the test passes, testing what it was supposed to test - failure to parse an enum value of a plugin option. Without these fixes the test hanged as in 4 when run on the main branch in non-prarallel (e.g. one test only) mode.
…keys on vault errors * let use_cache_on_timeout apply to other errors * enable use_cache_on_timeout by default and deprecate it * increase cache_timeout to max and deprecate it * change it from long to portable longlong * delete both in 13.3
* put autocommit/commit outside of LOCK/UNLOCK. * use uppercase like all other commands * restore the old value of autocommit
with `ORDER BY number` if the number doesn't refer to a valid result column, use this number in the error message not '???'.
restore OPTIMIZE/ANALYZE replication under @read_only that was disabled in b62101f
apparently a file can be present in *more than one* rpm, e.g. /usr/bin/dtrace on rhel10 is present *both* in systemtap-sdt-dtrace and in systemtap-sdt-devel. Make sure there's a separator between entries.
don't use such a greedy regex_replace pattern
event scheduler was printing a lot of info in [Note] in error log. change to print its startup/shutdown notes only when log_warnings>0. and runtime notes only when log_warnings>2. one note was an abnormal error, change to [Error].
if ((res=item->val_str(str)) != str) is incorrect way to detect whether res can be safely used, because Item_char_typecast::val_str() can return res which is different from str, but shares the same buffer.
Master_Server_Id was not cleared after CHANGE MASTER or RESET SLAVE, showing a stale value until the slave reconnected. Reset master_id and prev_master_id to 0 in both code paths. The reset value (0) will be present in SHOW SLAVE STATUS until it is re-evaluated to the id of a new connected master server. Signed-off-by: Varun Deep Saini <[email protected]>
There was a problem hiding this comment.
Ok, here some points.
-
there is a lot of unrelated whitespace changes. Please do not include them into the patch.
-
if THD can't be created, at startup, write a warning to error log (@dr-m can tell how to do it best for Innodb, but there is sql_print_warning), but do not create the timer, so that dict_stats_func is not going to be periodically called, if it can't do anything. You do not need to check if thd is NULL, you can DBUG_ASSERT() that it is not NULL.
-
I think , and ask @dr-m to confirm, that whatever Innodb is doing with
THDVAR(thd, background_thread) = true;is unnecessary. While harmless at startup, and can bite at shutdown, ininnodb_reset_background_thd()inut_ad(THDVAR(thd, background_thread));. To check if it is not a foreground thread,thd->thread_id == 0check is entirely sufficient, yes , server ensures that, yes, even then thread_ids wrap around, when 4 bytes ints are exhausted. I would remove that thing in fact, too. together withthd_proc_info(thd, name);, I have no idea what purpose does it serve, the background THDs are invisible from outside. I'd say, these wrappers around create_background_thd() and reset_thds() are obsolete, but would prefer to have @dr-m to confirm it. -
While bug was reported in the main branch, bug it mostly likely exists in all previous versions. Which means, fix could need to be rebased to 10.6 or 10.11, or whatever @gkodinov will decide. As far as I can see it, this bug might not be of a huge importance. From my limited my understanding and very short investigation, is that it only happens when server was not going to start anyway, otherwise, in case of successful startup, dict_stats cleanup would happen earlier, at ha_preshutdown/innodb_preshutdown stage, but I'd again let @dr-m confirm that. UPDATE: ok, this can happen at any bootstrap, unlike normal startup, it does not run ha_preshutdown currently.
dr-m
left a comment
There was a problem hiding this comment.
I believe that MDEV-38891 affects all currently maintained MariaDB Server branches. Therefore, a fix of this should target the 10.6 branch.
The real challenge is to reproduce the issue, possibly by injecting some sleep within DBUG_EXECUTE_IF. I would like to see a test case (even a nondeterministic one that fails once in N attempts) and some analysis of what is going on and how the fix is supposed to prevent the problem.
| destroy_background_thd(dict_stats_thd); | ||
| dict_stats_thd = nullptr; | ||
| } | ||
| } |
There was a problem hiding this comment.
By convention, our source code files end in a line terminator.
There was a problem hiding this comment.
My editor stripped it out by accident. I've added the missing newline terminator in the latest commit
itzanway
left a comment
There was a problem hiding this comment.
Thanks for the review! I've updated the PR to address these points:
Cleaned up the unrelated whitespace changes.
Added a check in dict_stats_start(): if THD creation fails, it now logs a warning via sql_print_warning and skips creating the timer.
Replaced the if (!dict_stats_thd) check in dict_stats_func() with DBUG_ASSERT(dict_stats_thd != nullptr).
|
@vaintroub |
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for working on this! This is a preliminary review.
There's already a very active review discussion: please keep working on that.
These are the extra points I see:
- Since this is a crashing bug, I'd agree with Marko that this should be rebased to 10.6. Please rebase.
- Please use a single commit with a commit message conforming to CODING_STANDARDS.md
- Please make sure that buildbot runs pass. Right now some platforms don't even compile
|
@vaintroub, I think that you are right about |
|
I have seen it once after running test suite hundreds of times, and luckily got it a the debugger. it is innodb_bootstrap test, and I can't say anything more about it. I reported the error as community service, the rest, as for reproducibility, is up to Innodb team. The thing that is failing is bootstrap "mariadbd --bootstrap" |
gkodinov
left a comment
There was a problem hiding this comment.
Any specific reason you've re-requested review from me? I have nothing new to add and nothing in the PR has changed. So please work on the remarks in my previous review.
Add support for reversed executable comments using /*!!version */ and /*M!!version */ syntax. These execute the comment body only when the server version is strictly less than the specified version, which is the inverse of the existing /*!version */ syntax. This enables writing portable SQL that uses newer syntax on new servers while falling back to older syntax on older servers, e.g.: CREATE /*!100000 OR REPLACE */ TABLE /*!!100000 IF NOT EXISTS */ t1 ... On MariaDB >= 10.0 this expands to CREATE OR REPLACE TABLE t1, while on older versions it expands to CREATE TABLE IF NOT EXISTS t1. Implementation: in lex_one_token(), after detecting a versioned comment (/*! or /*M!), check for an additional '!' character. If present, invert the version comparison so the comment body is expanded only when MYSQL_VERSION_ID < version. Tests added to main.comments and plugins.server_audit confirming: - Reversed comments with version <= server version do not execute - Reversed comments with version > server version execute - MariaDB-specific /*M!! variant works correctly - Reversed comments without a version number always execute - Audit plugin correctly logs executed reversed comments - Combined forward + reversed comments in a single statement All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
There is a lot of more work needed to make this code compatitble with the rest of the MariaDB server. - All std:: usage should be removed - Most templatest should be removed - Usage of <unordered_map>, <string_view>, <optional>, <unordered_set> should be removed and replaced with my_sys, sql_string functions It would be nice of one could also have default values for all the CHANGE MASTER variables in my.cnf. This would allow one to have much shorter CHANGE MASTER commands
|
@dr-m @vaintroub I have resolved the merge conflicts and ported this fix to the 10.6 branch. I also addressed the C++ nitpicks: Removed the redundant if check around delete dict_stats_timer. Restored the missing newline terminator at the end of dict0stats_bg.cc. Finally, I added a deterministic test case (mdev_38891.test) using DBUG_EXECUTE_IF. By injecting a 2-second sleep into dict_stats_func, we can simulate the exact race condition where the thread wakes up during early shutdown while the plugin system is being destroyed. The test reliably crashes an unpatched server and passes cleanly with this fix applied." |
…ns (fix) Correct compilation on clang-20 by using a bitwise & rather than the incorrect &&.
No description provided.